ACL - 05 Computational Approaches to Semitic Languages
Abstract
We explore the application of memory-based learning to morphological analysis and part-of-speech tagging of written Arabic, based on data from the Arabic Treebank. Morphological analysis – the construction of all possible analyses of isolated unvoweled wordforms – is performed as a letter-by-letter operation prediction task, where the operation encodes segmentation, part-of-speech, character changes, and vocalization. Part-of-speech tagging is carried out by a bi-modular tagger that has one subtagger for known words and one for unknown words. We report on the performance of the morphological analyzer and part-of-speech tagger. We observe that the tagger, which has an accuracy of 91.9% on new data, can be used to select the appropriate morphological analysis of words in context at a precision of 64.0 and a recall of 89.7.
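The letter-by-letter formulation described in the abstract can be illustrated with a minimal sketch of memory-based (nearest-neighbour) classification. Everything concrete here is an assumption made for illustration and is not taken from the paper: the window width, the overlap similarity, the operation labels `SEG` and `KEEP`, the transliterated toy words, and the function names `windows`, `overlap`, `predict`, and `analyze`. It also predicts a single operation per letter, whereas the paper's analyzer constructs all possible analyses.

```python
from collections import Counter

def windows(word, labels, width=2):
    """Yield one (features, label) instance per letter.

    The features are the focus letter plus `width` letters of context on
    each side, padded with '_' at the word boundaries.
    """
    padded = "_" * width + word + "_" * width
    for i, label in enumerate(labels):
        yield tuple(padded[i:i + 2 * width + 1]), label

def overlap(a, b):
    """Count the feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))

def predict(memory, features, k=1):
    """Return the majority label among the k most similar stored instances."""
    ranked = sorted(memory, key=lambda inst: overlap(inst[0], features), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy training data: transliterated words with one
# operation label per letter (labels invented for this sketch).
train = [
    ("wktb", ["SEG", "KEEP", "KEEP", "KEEP"]),
    ("ktb", ["KEEP", "KEEP", "KEEP"]),
]

# "Training" a memory-based learner amounts to storing every instance.
memory = [inst for word, labels in train for inst in windows(word, labels)]

def analyze(word):
    """Predict one operation per letter of an unvoweled word."""
    placeholder = [None] * len(word)
    return [predict(memory, feats) for feats, _ in windows(word, placeholder)]

print(analyze("wktb"))
```

In this sketch the operation labels are atomic strings; in the setting the abstract describes, each operation would jointly encode segmentation, part-of-speech, character changes, and vocalization.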
Similar resources
Can You Tag the Modal? You Should
Computational linguistics methods are typically first developed and tested in English. When applied to other languages, assumptions from English data are often applied to the target language. One of the most common such assumptions is that a “standard” part-of-speech (POS) tagset can be used across languages with only slight variations. We discuss in this paper a specific issue related to the d...
Integrated Morphological and Syntactic Disambiguation for Modern Hebrew
Current parsing models are not immediately applicable for languages that exhibit strong interaction between morphology and syntax, e.g., Modern Hebrew (MH), Arabic and other Semitic languages. This work represents a first attempt at modeling morphological-syntactic interaction in a generative probabilistic framework to allow for MH parsing. We show that morphological information selected in tan...
Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East.
The evolution of languages provides a unique opportunity to study human population history. The origin of Semitic and the nature of dispersals by Semitic-speaking populations are of great importance to our understanding of the ancient history of the Middle East and Horn of Africa. Semitic populations are associated with the oldest written languages and urban civilizations in the region, which g...
Modifying a Natural Language Processing System for European Languages to Treat Arabic in Information Processing and Information Retrieval Applications
The goal of many natural language processing platforms is to be able to someday correctly treat all languages. Each new language, especially one from a new language family, provokes some modification and design changes. Here we present the changes that we had to introduce into our platform designed for European languages in order to handle a Semitic language. Treatment of Arabic was successfull...
A Treebank of Ugaritic. Annotating Fragmentary Attested Languages
The paper presents an outline of a treebank of Ugaritic, an extinct Semitic language. It describes the basic structure of the treebank and the possibility of re-using approaches applied to other Semitic languages. It also discusses the problems of analyzing a language attested only in fragmentary form and the possible use of treebank-based approaches for further reconstruction of text passages.